Ambiguous author query detection using crowdsourced digital library annotations

Authors

  • Xiaoling Sun
  • Jasleen Kaur
  • Lino Possamai
  • Filippo Menczer
Abstract

The name ambiguity problem is especially challenging in the field of bibliographic digital libraries. The problem is amplified when names are collected from heterogeneous sources. This is the case in the Scholarometer system, which performs bibliometric analysis by cross-correlating author names in user queries with those retrieved from digital libraries. The uncontrolled nature of user-generated annotations is very valuable, but creates the need to detect ambiguous names. Our goal is to detect ambiguous names at query time by mining digital library annotation data, thereby decreasing noise in the bibliometric analysis. We explore three kinds of heuristic features based on citations, metadata, and crowdsourced topics in a supervised learning framework. The proposed approach achieves almost 80% accuracy. Finally, we compare the performance of ambiguous author detection in Scholarometer using Google Scholar against a baseline based on Microsoft Academic Search.
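The abstract names three feature families (citations, metadata, crowdsourced topics) feeding a supervised classifier, but this record does not list the actual features or the learning algorithm. The following is only a minimal sketch of that general setup: the three features, the dictionary keys (`citations`, `venue`, `topics`), the choice of logistic regression, and the toy records are all assumptions made for illustration and are not taken from the paper.

```python
import math

def extract_features(papers):
    """Turn the papers returned for one author-name query into three numbers.

    Each paper is a dict with hypothetical keys:
    'citations' (int), 'venue' (str), 'topics' (set of str).
    """
    n = len(papers)
    if n == 0:
        return [0.0, 0.0, 0.0]
    cites = [p["citations"] for p in papers]
    total = sum(cites)
    # Citation-based (assumed): share of citations held by the top paper.
    top_share = max(cites) / total if total else 0.0
    # Metadata-based (assumed): distinct venues per paper.
    venue_diversity = len({p["venue"] for p in papers}) / n
    # Topic-based (assumed): distinct crowdsourced topic tags per paper.
    topic_diversity = len(set().union(*(p["topics"] for p in papers))) / n
    return [top_share, venue_diversity, topic_diversity]

def score(w, x):
    """Linear score: bias w[0] plus weighted features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain stochastic gradient ascent on the logistic log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-score(w, xi)))
            err = yi - p
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

def predict(w, x):
    """Return 1 (ambiguous name) or 0 (unambiguous name)."""
    return 1 if score(w, x) > 0 else 0

# Invented toy queries: one focused profile, one mixed profile.
focused = [{"citations": 10, "venue": "A", "topics": {"ir"}} for _ in range(4)]
mixed = [
    {"citations": 10, "venue": v, "topics": {t}}
    for v, t in [("A", "ir"), ("B", "physics"), ("C", "biology"), ("D", "law")]
]

X = [extract_features(focused), extract_features(mixed)]
y = [0, 1]  # 1 = name labeled ambiguous in this toy example
weights = train_logistic(X, y)
```

With these invented records, a query whose results spread over many venues and topics scores as ambiguous, while a focused profile does not. A real system would need labeled queries and features validated against the data, which this sketch does not attempt.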


Similar resources

Managing the Quality of Large-Scale Crowdsourcing

Crowdsourcing can be used to obtain relevance judgments needed for the evaluation of information retrieval systems. However, the quality of crowdsourced relevance judgments may be questionable; a substantial amount of workers appear to spam HITs in order to maximize their hourly wages, and workers may know less than expert annotators about the topic being queried. The task for the TREC 2011 Cro...


Using Genetic Programming to Evaluate the Impact of Social Network Analysis in Author Name Disambiguation

In digital libraries, which have become extremely popular in the scientific community, often people want to find publications by an author using the author name as a query. However, since authors may have many denominations and one denomination may refer to many authors, name searches may present ambiguous results. To tackle this problem, several studies have been developed. Recently the use of...


Ethnicity Sensitive Author Disambiguation Using Semi-supervised Learning

Author name disambiguation in bibliographic databases is the problem of grouping together scientific publications written by the same person, accounting for potential homonyms and/or synonyms. Among solutions to this problem, digital libraries are increasingly offering tools for authors to manually curate their publications and claim those that are theirs. Indirectly, these tools allow for the ...


Crowdsourcing image annotation for nucleus detection and segmentation in computational pathology: evaluating experts, automated methods, and the crowd.

The development of tools in computational pathology to assist physicians and biomedical scientists in the diagnosis of disease requires access to high-quality annotated images for algorithm learning and evaluation. Generating high-quality expert-derived annotations is time-consuming and expensive. We explore the use of crowdsourcing for rapidly obtaining annotations for two core tasks in comp...


User Annotations as a Context for Related Document Search on the Web and Digital Libraries

s = disambig.pages.abstracts
for abstract in abstracts do
  text = abstracts.shuffle.join(" ")
  graph = Graph.new(text)
  annot = Annotation.create(abstract)
  graph.activate(annot, weights)
  graph.spreadActivation()
  query = graph.topNodes
  results = ElasticSearch(query)
  cat = abstract.page.categories
  relevant = results.withCategory(cat)



Journal title:
  • Inf. Process. Manage.

Volume 49

Publication date: 2013